
Transfer Learning for Speech Recognition on a Budget

Author: Jens Johannsmeier
Louis Kirsch
Andreas Krug
Julius Kunze
Ilia Kurenkov
Sebastian Stober
Publication date: 01/01/2017

End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network's weights were sufficient for good performance, especially for inner layers.
Comment: Accepted for 2nd ACL Workshop on Representation Learning for NL

arXiv.org e-Print Archive

Crossref
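The model adaptation described in the first abstract can be illustrated with a minimal pure-Python sketch. It assumes a network is simply a list of weight matrices, keeps the pretrained English hidden layers, replaces the output layer because the German character set is larger, and optionally freezes the lowest layers. The helper names (`init_matrix`, `adapt_model`), the character sets, the extra blank symbol, and the freezing scheme are illustrative assumptions, not the paper's actual Wav2Letter configuration.

```python
import copy
import random

# Hypothetical character sets for illustration; the alphabets used in the
# paper may differ. German adds umlauts and sharp s to the English letters.
ENGLISH_CHARS = list("abcdefghijklmnopqrstuvwxyz' ")
GERMAN_CHARS = ENGLISH_CHARS + list("äöüß")


def init_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (hypothetical helper)."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(cols)] for _ in range(rows)]


def adapt_model(layers, new_output_size, n_frozen, seed=0):
    """Hypothetical sketch of adapting a pretrained model to a new language.

    `layers` lists weight matrices (one row per output unit) from input to
    output. Hidden layers are deep-copied with their pretrained weights; the
    output layer is replaced by a freshly initialized matrix with
    `new_output_size` rows, since the target alphabet differs. The first
    `n_frozen` layers are flagged as not trainable.
    Returns (new_layers, trainable_flags).
    """
    rng = random.Random(seed)
    new_layers = copy.deepcopy(layers[:-1])
    in_dim = len(layers[-1][0])  # input width of the old output layer
    new_layers.append(init_matrix(new_output_size, in_dim, rng))
    trainable = [i >= n_frozen for i in range(len(new_layers))]
    return new_layers, trainable


# Toy "English" model: two hidden layers and an output layer with one unit
# per character plus one blank symbol (assuming a CTC-style output).
rng = random.Random(42)
english_model = [
    init_matrix(8, 4, rng),
    init_matrix(8, 8, rng),
    init_matrix(len(ENGLISH_CHARS) + 1, 8, rng),
]
german_model, trainable = adapt_model(english_model, len(GERMAN_CHARS) + 1, n_frozen=1)
print(len(german_model[-1]), trainable)  # 33 [False, True, True]
```

Which layers to freeze or reinitialize is a design choice in this sketch; the abstract itself only reports that small weight changes sufficed, especially for inner layers.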

Analyzing and Visualizing Deep Neural Networks for Speech Recognition with Saliency-Adjusted Neuron Activation Profiles

Author: Andreas Krug
Jens Johannsmeier
Jost Alemann
Maral Ebrahimzadeh
Sebastian Stober
Publication venue: MDPI AG
Publication date: 01/06/2021

Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework for analyzing and visualizing Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and comparatively analyzing their representations in the model layers.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals
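The profile idea in the second abstract can be sketched as grouping inputs by label (for example, by phoneme), weighting each neuron's activation by a saliency score, averaging per group, and subtracting the mean over all inputs so that each profile highlights what is specific to its group. This is a simplified illustration under stated assumptions: each example is reduced to one activation value and one saliency value per neuron, and the function name `activation_profiles` is hypothetical. The authors' SNAP definition (how saliency is computed, how time-varying speech inputs are aligned, and how profiles are normalized) may differ.

```python
from collections import defaultdict


def activation_profiles(activations, saliencies, labels):
    """Hypothetical, simplified saliency-weighted activation profiles.

    activations, saliencies: lists of equal-length vectors, one per example
    (one value per neuron, assumed already reduced over time).
    labels: group label for each example (e.g. a phoneme).
    Returns {label: profile}, where each profile is the group mean of the
    saliency-weighted activations minus the mean over all examples.
    """
    n_neurons = len(activations[0])
    # Weight each activation by its saliency score
    weighted = [[a * s for a, s in zip(act, sal)]
                for act, sal in zip(activations, saliencies)]
    # Mean over all examples, used as the baseline
    overall = [sum(v[i] for v in weighted) / len(weighted) for i in range(n_neurons)]
    # Accumulate per-group sums and counts
    sums = defaultdict(lambda: [0.0] * n_neurons)
    counts = defaultdict(int)
    for vec, label in zip(weighted, labels):
        counts[label] += 1
        for i, x in enumerate(vec):
            sums[label][i] += x
    return {
        label: [sums[label][i] / counts[label] - overall[i] for i in range(n_neurons)]
        for label in sums
    }


# Toy data: two neurons, two groups; neuron 0 responds to "a", neuron 1 to "b".
acts = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
sals = [[1.0, 1.0]] * 4  # uniform saliency, so weighting changes nothing here
labels = ["a", "a", "b", "b"]
profiles = activation_profiles(acts, sals, labels)
print(profiles)  # {'a': [1.0, -1.5], 'b': [-1.0, 1.5]}
```

Subtracting the overall mean is one possible normalization choice; it makes neurons that fire for every group look neutral rather than characteristic.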